Abstract

We study the stochastic versions of a broad class of combinatorial problems where the weights of the
elements in the input dataset are uncertain. The class of problems that we study includes shortest paths,
minimum weight spanning trees, and minimum weight matchings over probabilistic graphs, and other
combinatorial problems like knapsack. We observe that the expected value is inadequate in capturing
different types of risk-averse or risk-prone behaviors, and instead we consider a more general objective
which is to maximize the expected utility of the solution for some given utility function, rather than the
expected weight (expected weight becomes a special case). We show that we can obtain a polynomial
time approximation algorithm with additive error $\epsilon$ for any $\epsilon > 0$, if there is a pseudopolynomial time
algorithm for the exact version of the problem (this is true for the problems mentioned above) and the
maximum value of the utility function is bounded by a constant.^1 Our result generalizes several prior
results on stochastic shortest path, stochastic spanning tree, and stochastic knapsack. Our algorithm for
utility maximization makes use of the separability of exponential utility and a technique to decompose
a general utility function into exponential utility functions, which may be useful in other stochastic
optimization problems.



1 Introduction 

The most common approach to dealing with optimization problems in the presence of uncertainty is to optimize
the expected value of the solution. However, the expected value is inadequate for expressing people's diverse
preferences in decision-making under uncertainty. In particular, it fails to capture the different
risk-averse or risk-prone behaviors that are commonly observed. Consider the following simple example
where we have two lotteries $L_1$ and $L_2$. In $L_1$, the player wins 1000 dollars with probability 1.0,
while in $L_2$ the player wins 2000 dollars with probability 0.5 and 0 dollars otherwise. It is easy to
see that both have the same expected payoff of 1000 dollars. However, many, if not most, people would
treat $L_1$ and $L_2$ as two completely different choices. Specifically, a risk-averse player is likely to choose $L_1$
and a risk-prone player may prefer $L_2$ (consider a gambler who would like to spend 1000 dollars to play
double-or-nothing). A more involved but also more surprising example is the St. Petersburg paradox (see

* lapordge@gmail.com 
^ amol@cs.umd.edu

^1 Following the literature [46], we differentiate between the exact version and the deterministic version of a problem; in the exact version
of the problem, we are given a target value and asked to find a solution (e.g., a path) with exactly that value (i.e., path length).
e.g., [37, 1]) which has been widely used in the economics literature as a criticism of expected value. See
Appendix A for a brief description of the St. Petersburg paradox. These observations and criticisms have 
led researchers, especially in Economics, to study the problem from a more fundamental perspective and 
to directly maximize user satisfaction, often called utility. The uncertainty present in the problem instance 
naturally leads us to optimize the expected utility. 

Let $\mathcal{F}$ be the set of feasible solutions to an optimization problem. Each solution $S \in \mathcal{F}$ is associated
with a random weight $w(S)$. For instance, $\mathcal{F}$ could be a set of lotteries and $w(S)$ is the (random) payoff of
lottery $S$. We model the risk awareness of a user by a utility function $\mu : \mathbb{R} \to \mathbb{R}$: the user obtains $\mu(x)$
units of utility if the outcome is $x$, i.e., $w(S) = x$. Formally, the expected utility maximization principle is
simply stated as follows: the most desirable solution $S$ is the one that maximizes the expected utility, i.e.,

$$S = \arg\max_{S' \in \mathcal{F}} \mathbb{E}[\mu(w(S'))].$$

Indeed, expected utility theory is a branch of utility theory that studies the "betting preferences" of
people with regard to uncertain outcomes (gambles). The theory was formally initiated by von Neumann
and Morgenstern in the 1940s [55, 21],^2 who gave an axiomatization of the theory (known as the von
Neumann-Morgenstern expected utility theorem). The theory is well known to be versatile in expressing
diverse risk-averse or risk-prone behaviors.

In this paper, we focus on the following broad class of combinatorial optimization problems. The
deterministic version of the problem has the following form: we are given a ground set of elements
$U = \{e_i\}_{i=1,\ldots,n}$; each element $e$ is associated with a weight $w_e$; each feasible solution is a subset of the elements
satisfying some property. Let $\mathcal{F}$ denote the set of feasible solutions. The objective for the deterministic
problem is to find a feasible solution $S$ with the minimum total weight $w(S) = \sum_{e \in S} w_e$. We can see
that many combinatorial problems such as shortest path, minimum spanning tree, and minimum weight
matching belong to this class. In the stochastic version of the problem, the weight $w_e$ of each element $e$ is
a nonnegative random variable. We assume all $w_e$'s are independent of each other. We use $p_e(\cdot)$ to denote
the probability density function for $w_e$ (or the probability mass function in the discrete case). We are also given a
utility function $\mu : \mathbb{R}^+ \to \mathbb{R}^+$ which maps a weight value to a utility value. By the expected utility
maximization principle, our goal here is to find a feasible solution $S \in \mathcal{F}$ that maximizes the expected utility,
i.e., $\mathbb{E}[\mu(w(S))]$. We call this problem the expected utility maximization (EUM) problem.

Let us use the following toy example to illustrate the rationale behind EUM. There is a graph with two
nodes $s$ and $t$ and two parallel links $e_1$ and $e_2$. Edge $e_1$ has a fixed length 1 while the length of $e_2$ is 0.9 with
probability 0.9 and 1.9 with probability 0.1 (the expected value is also 1). We want to choose one edge to
connect $s$ and $t$. It is not hard to imagine that a risk-averse user would choose $e_1$ since $e_2$ may turn out to
be a much larger value with a nontrivial probability. We can capture such behavior using the utility function
(1) (defined in Section 1.1). Similarly, we can capture the risk-prone behavior by using, for example, the
utility function $\mu(x) = \ldots$ [expression missing in the source]. It is easy to see that $e_1$ maximizes the expected
utility in the former case, and $e_2$ in the latter.
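The toy example can be checked numerically. The sketch below is only an illustration under stated assumptions: the risk-averse utility is the continuous threshold function of equation (1) with an assumed $\delta = 0.5$, and the risk-prone utility `convex_utility` is a hypothetical convex decreasing function picked for this illustration, not one taken from the text.

```python
def chi_tilde(x, delta=0.5):
    """Continuous threshold utility: 1 on [0, 1], linear down to 0 at 1 + delta.

    delta = 0.5 is an assumed value used only for this illustration.
    """
    if x <= 1.0:
        return 1.0
    if x <= 1.0 + delta:
        return (1.0 + delta - x) / delta
    return 0.0


def convex_utility(x):
    """Hypothetical risk-prone (convex, decreasing) utility; an assumption."""
    return 1.0 / (x + 1.0)


def expected_utility(dist, utility):
    """dist is a list of (length, probability) pairs."""
    return sum(p * utility(x) for x, p in dist)


# The two parallel edges of the toy example.
e1 = [(1.0, 1.0)]
e2 = [(0.9, 0.9), (1.9, 0.1)]
edges = (("e1", e1), ("e2", e2))

averse = {name: expected_utility(d, chi_tilde) for name, d in edges}
prone = {name: expected_utility(d, convex_utility) for name, d in edges}

best_averse = max(averse, key=averse.get)
best_prone = max(prone, key=prone.get)
print(best_averse, best_prone)
```

Both edges have expected length 1, yet the two utilities rank them differently, which is the point of the example.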

1.1 Our Contributions 

We discuss in detail our result for EUM. We assume $\mu$ is part of the specification of the problem but not part
of the input. Moreover, we assume $\lim_{x \to \infty} \mu(x) = 0$. This captures the fact that if the weight of the solution is
too large, it becomes almost useless for us. W.l.o.g. we can also assume $0 \le \mu(x) \le 1$ for $x \ge 0$, by scaling.

^2 Daniel Bernoulli also developed many ideas, such as risk aversion and utility, in his work Specimen theoriae novae de mensura
sortis (Exposition of a New Theory on the Measurement of Risk) in 1738 [8].
Figure 1: (1) The utility function $\tilde{\chi}(x)$, a continuous variant of the threshold function $\chi(x)$; (2) A smoother
variant of $\chi(x)$; (3) The utility function $\tilde{\chi}_2(x)$, a continuous variant of the 2-d threshold function $\chi_2(x)$.

We say a function $\tilde{\mu}(x)$ is an $\epsilon$-approximation of $\mu(x)$ if $|\tilde{\mu}(x) - \mu(x)| \le \epsilon$ $\forall x \ge 0$. For ease of exposition,
we let $\tilde{\mu}(x)$ be a complex function. Recall that a polynomial time approximation scheme (PTAS) is an
algorithm which takes an instance of a minimization problem and a parameter $\epsilon$ and produces a solution
whose cost is within a factor $1 + \epsilon$ of the optimum, and the running time, for any fixed $\epsilon$, is polynomial
in the size of the input. We use A to denote the deterministic combinatorial optimization problem under
consideration. The exact version of a problem A asks the question whether there is a feasible solution of A
with weight exactly equal to a given number $K$. We say an algorithm runs in pseudopolynomial time for the
exact version of A if the running time is polynomial in $n$ and $K$. Our first main theorem is the following.

Theorem 1 Assume that there is a pseudopolynomial algorithm for the exact version of A. Further assume
that given any $\epsilon > 0$, we can find an $\epsilon$-approximation of the utility function $\mu$ as $\tilde{\mu}(x) = \sum_{k=1}^{L} c_k \phi_k^x$, where
$L$ is a constant and $|\phi_k| \le 1$ $\forall k$; $\phi_k$ may be complex numbers. Then, there is an algorithm that runs in time
$(n/\epsilon)^{O(L)}$ that approximates EUM($\mu$) with an additive error $O(\epsilon)$. If the optimal expected utility is $\Omega(1)$,
we obtain a PTAS.
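To make the "exact version" assumption of Theorem 1 concrete, here is a minimal sketch for the simplest possible problem A, where every subset of the ground set is feasible (an illustrative choice made here, in the spirit of knapsack). The function name `exact_subset_exists` is hypothetical. It decides whether some subset has weight exactly $K$ in time polynomial in $n$ and $K$, which is what pseudopolynomial means in the text.

```python
def exact_subset_exists(weights, K):
    """Return True if some subset of the nonnegative integer weights sums to exactly K.

    Classic dynamic program with O(n * K) running time, i.e. polynomial in
    the number of elements n and the target value K (pseudopolynomial).
    """
    reachable = [False] * (K + 1)
    reachable[0] = True  # the empty set has weight 0
    for w in weights:
        # Iterate downwards so that each element is used at most once.
        for total in range(K, w - 1, -1):
            if reachable[total - w]:
                reachable[total] = True
    return reachable[K]


print(exact_subset_exists([3, 5, 7], 12))
print(exact_subset_exists([3, 5, 7], 11))
```

For shortest path, spanning tree, or matching, the exact algorithms are more involved; this sketch only illustrates the kind of oracle the theorem assumes.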

For many combinatorial problems, a pseudopolynomial algorithm for the exact version is known. Examples
include shortest path, spanning tree, matching and knapsack. Hence, the only task left is to find a short
exponential sum that $\epsilon$-approximates $\mu$. For this purpose, we adopt the Fourier series technique. However,
the technique cannot be used directly since it works only for periodic functions with bounded periodicities.
In order to get a good approximation for $x \in [0, \infty)$, we leverage the fact that $\lim_{x \to \infty} \mu(x) = 0$ and
develop a general framework that uses the Fourier series decomposition as a subroutine. Generally speaking,
such an approximation is only possible if the function is "well behaved", i.e., it satisfies some continuity or
smoothness conditions. In particular, we prove Theorem 2. We say that the utility function $\mu$ satisfies the
$\alpha$-Hölder condition if $|\mu(x) - \mu(y)| \le C |x - y|^{\alpha}$ for some constant $C$ and some constant $\alpha$.

Theorem 2 If $\mu$ satisfies the $\alpha$-Hölder condition for some constant $\alpha > 1/2$, then, for any $\epsilon > 0$, we can
obtain an exponential sum with $O(\mathrm{poly}(\frac{1}{\epsilon}))$ terms which is an $\epsilon$-approximation of $\mu$ for $x \ge 0$.

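Consider the utility function

$$\tilde{\chi}(x) = \begin{cases} 1 & x \in [0, 1] \\ -\frac{x}{\delta} + \frac{1}{\delta} + 1 & x \in [1, 1 + \delta] \\ 0 & x > 1 + \delta \end{cases} \qquad (1)$$

where $\delta > 0$ is a small constant (see Figure 1(1)). We can verify that $\tilde{\chi}$ satisfies the 1-Hölder condition with
$C = \frac{1}{\delta}$. Therefore, Theorem 2 is applicable. This example is interesting since it can be viewed as a
continuous variant of the threshold function

$$\chi(x) = \begin{cases} 1 & x \in [0, 1] \\ 0 & x > 1 \end{cases} \qquad (2)$$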
for which maximizing the expected utility is equivalent to maximizing $\Pr(w(S) \le 1)$. This special case
has been considered several times in the literature for various problems including stochastic shortest path [43],
stochastic spanning tree [30, 23], stochastic knapsack [24] and some other stochastic problems [3, 41].

It is interesting to compare our result with the result for the stochastic shortest path problem considered
by Nikolova et al. [43, 41]. In [43], they show that there is an exact $O(n^{\log n})$ time algorithm for maximizing
the probability that the length of the path is at most 1, i.e., $\Pr(w(S) \le 1)$, assuming all edges are normally
distributed and there is a path with its mean at most 1. Later, Nikolova [41] extends the result to an FPTAS for
any problem under the same assumptions, if the deterministic version of the problem has a polynomial time
exact algorithm. We can see that under such assumptions, the optimal probability is at least 1/2.^3 Therefore,
provided the same assumption and further assuming that $\Pr(w_e < 0)$ is minuscule,^4 our algorithm is a PTAS
for the continuous variant of the problem. Indeed, we can translate this result to a bi-criterion approximation
result of the following form: for any fixed $\delta, \epsilon > 0$, we can find in polynomial time a solution $S$ such that

$$\Pr(w(S) \le 1 + \delta) \ge (1 - \epsilon) \Pr(w(S^*) \le 1),$$

where $S^*$ is the optimal solution (Corollary 2). We note that such a bi-criterion approximation was only
known for exponentially distributed edges before [43].

Let us consider another application of our results to the stochastic knapsack problems defined in [24].
Given a set $U$ of independent random variables $\{x_1, \ldots, x_n\}$, with associated profits $\{v_1, \ldots, v_n\}$ and an
overflow probability $\gamma$, we are asked to pick a subset $S$ of $U$ such that

$$\Pr\Big(\sum_{i \in S} x_i > 1\Big) \le \gamma$$

and the total profit $\sum_{i \in S} v_i$ is maximized. Goel and Indyk [24] showed that, for any $\epsilon > 0$, there is a
polynomial time algorithm that can find a solution $S$ with the profit at least the optimum and
$\Pr(\sum_{i \in S} x_i > 1 + \epsilon) \le \gamma(1 + \epsilon)$ for exponentially distributed variables. They also gave a quasi-polynomial time
approximation scheme for Bernoulli distributed random variables. Quite recently, in parallel with our work, Bhalgat
et al. [13] obtained the same result for arbitrary distributions under the assumption that $\gamma = \Theta(1)$. Their
technique is based on discretizing the distributions and is quite involved. Our result, applied to stochastic
knapsack, matches that of Bhalgat et al. We remark that our algorithm is much simpler and has a much
better running time (Theorem 5). Despite a little loss in the approximation guarantees in some cases, our
technique can be applied to almost all positive probability distributions, and a much richer class of utility
functions.

Equally importantly, we can extend our basic approximation scheme to handle generalizations such as
multiple utility functions and multidimensional weights. Interesting applications of these extensions include
generalizations of stochastic knapsack, such as stochastic multiple knapsack (Theorem 8) and stochastic
multidimensional knapsack (stochastic packing) (Theorem 9).

1.2 Related Work 

In recent years stochastic optimization problems have drawn much attention from the computer science
community and stochastic versions of many classical combinatorial optimization problems have been studied.
In particular, a significant portion of the efforts has been devoted to the two-stage stochastic optimization

^3 The sum of multiple Gaussians is also a Gaussian. Hence, if we assume the mean of the length of a path (which is a Gaussian)
is at most 1, the probability that the length of the path is at most 1 is at least 1/2.

^4 Our technique can only handle distributions with positive supports. Thus, we have to assume that the probability that a negative
value appears is minuscule and can be safely ignored.
problem. In such a problem, in a first stage, we are given probabilistic information about the input but the
cost of selecting an item is low; in a second stage, the actual input is revealed but the costs for the elements
are higher. We are asked to make decisions after each stage and minimize the expected cost. Some general
techniques have been developed [28, 50]. We refer the interested reader to [54] for a comprehensive survey.
Another widely studied type of problem considers designing adaptive probing policies for stochastic
optimization problems where the existence or the exact weight of an element can only be known upon a probe.
There is typically a budget for the number of probes (see e.g., [27, 18]), or we require an irrevocable
decision whether to include the probed element in the solution right after the probe (see e.g., [20, 16, 5, 19, 13]).
However, most of those works focus on optimizing the expected value of the solution. There is also sporadic
work on optimizing the overflow probability or some other objectives subject to the overflow probability
constraints. In particular, a few recent works have explicitly motivated such objectives as a way to capture
the risk-averse type of behaviors [3, 41, 53]. Besides those works, there has been little work on optimizing
more general utility functions for combinatorial stochastic optimization problems from an approximation
algorithms perspective.

The most related work to ours is the stochastic shortest path problem (Stoch-SP), which was also the 
initial motivation for this work. The problem has been studied extensively for several special utility functions 
in the operations research community. Sigal et al. [51] studied the problem of finding the path with greatest
probability of being the shortest path. Loui [36] showed that Stoch-SP reduces to the shortest path (and 
sometimes longest path) problem if the utility function is linear or exponential. Nikolova et al. [42] identified 
more specific utility and distribution combinations that can be solved optimally in polynomial time. Much 
work considered dealing with more general utility functions, such as piecewise linear or concave functions, 
e.g., [39, 40, 7]. However, these algorithms are essentially heuristics and the worst case running times are 
still exponential. Nikolova et al. [43] studied the problem of maximizing the probability that the length of the 
chosen path is less than some given parameter. Besides the result we mentioned before, they also considered 
Poisson and exponential distributions. Despite much effort on this problem, no algorithm is known to run 
in polynomial time and have provable performance guarantees, especially for more general utility functions 
or more general distributions. This is perhaps because the hardness comes from different sources, as also 
noted in [43] : the shortest path selection per se is combinatorial; the distribution of the length of a path is 
the convolution of the distributions of its edges; the objective is nonlinear; to list a few. 

Kleinberg et al. [32] first considered the stochastic knapsack problem with Bernoulli-type
distributions and provided a polynomial-time $O(\log 1/\gamma)$ approximation where $\gamma$ is the given overflow probability.
For item sizes with exponential distributions, Goel and Indyk [24] provided a bi-criterion PTAS, and for
Bernoulli-distributed items they gave a quasi-polynomial approximation scheme. Chekuri and Khanna [15]
pointed out that a PTAS can be obtained for the Bernoulli case using their techniques for the multiple
knapsack problem. Goyal and Ravi [26] showed a PTAS for Gaussian distributed sizes. Quite recently, Bhalgat,
Goel and Khanna [13] developed a general discretization technique that reduces the distributions to a small
number of equivalence classes which we can efficiently enumerate for both adaptive and nonadaptive versions
of stochastic knapsack. They used this technique to obtain improved results for several variants of stochastic
knapsack, notably a bi-criterion PTAS for the adaptive version of the problem. Dean et al. [20] gave the first
constant approximation for the adaptive version of stochastic knapsack. The adaptive version of stochastic
multidimensional knapsack (or equivalently stochastic packing) has been considered in [19, 13] where
constant approximations and a bi-criterion PTAS were developed.

This work is partially inspired by our prior work on top-k and other queries over probabilistic datasets [33, 
35]. In fact, we can show that both the consensus answers proposed in [33] and the parameterized ranking 
functions proposed in [35] follow the expected utility maximization principle where the utility functions 
are materialized as distance metrics for the former and the weight functions for the latter. Our technique
for approximating the utility functions is also similar to the approximation scheme used in [35] in spirit. 
However, no performance guarantees are provided in that work. 

There is a large volume of work on approximating functions using short exponential sums over a
bounded domain, e.g., [45, 9, 10, 11]. Some works also consider using linear combinations of Gaussians or
other kernels to approximate functions with finite support over the entire real axis $(-\infty, +\infty)$ [17]. This
is however impossible using exponentials since $\alpha^x$ is either periodic (if $|\alpha| = 1$) or approaches infinity
when $x \to +\infty$ or $x \to -\infty$ (if $|\alpha| \ne 1$).

2 Algorithm 

We first note that EUM is #P-hard in general since the problem of computing the overflow probability of a
set of items with Bernoulli distributions, a very special case of our problem, is #P-hard [32].

Our approach is very simple. We first observe that the problem is easy if the utility function is an
exponential function. We approximate the utility function $\mu(x)$ by a short exponential sum, i.e., $\sum_{i=1}^{L} c_i \phi_i^x$
with $L$ being a constant ($c_i$ and $\phi_i$ may be complex numbers). Hence, $\mathbb{E}[\mu(w(S))]$ can be approximated by
$\sum_{i=1}^{L} c_i \mathbb{E}[\phi_i^{w(S)}]$. Then, we consider the following multi-criterion version of the problem with $L$ objectives
$\{\mathbb{E}[\phi_i^{w(S)}]\}_{i=1,\ldots,L}$: given $L$ complex numbers $v_1, \ldots, v_L$, we want to find a solution $S$ such that
$\mathbb{E}[\phi_i^{w(S)}] \approx v_i$ for $i = 1, \ldots, L$. We achieve this by utilizing the pseudopolynomial time algorithm for the exact version
of the problem. We argue that we only need to consider a polynomial number of $v_1, \ldots, v_L$ combinations
(which we call configurations) to find out the approximate optimum. In Section 2.1, we show how to
solve the multi-criterion problem provided that a short exponential sum approximation of $\mu$ is given. In
particular, we prove Theorem 1. Then, we show how to approximate $\mu$ by a short exponential sum by
proving Theorem 2 in Section 2.2 and Section 2.3.

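Let us first consider the exponential utility function $\mu(x) = \alpha^x$ for any $\alpha \in \mathbb{C}$. Fix an arbitrary solution
$S$ and $\alpha > 0$. Due to the independence of the elements, we can see that

$$\mathbb{E}[\alpha^{w(S)}] = \mathbb{E}\big[\alpha^{\sum_{e \in S} w_e}\big] = \mathbb{E}\Big[\prod_{e \in S} \alpha^{w_e}\Big] = \prod_{e \in S} \mathbb{E}[\alpha^{w_e}].$$

Taking log on both sides, we get $\log \mathbb{E}[\alpha^{w(S)}] = \sum_{e \in S} \log \mathbb{E}[\alpha^{w_e}]$. If $\alpha$ is a positive real number and
$\mathbb{E}[\alpha^{w_e}] \le 1$ (or equivalently, $-\log \mathbb{E}[\alpha^{w_e}] \ge 0$), this reduces to the deterministic optimization problem.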
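The separability identity above can be verified numerically on a small instance. The following sketch is an illustration only: the distributions in `S`, the base `alpha`, and the helper names are made-up assumptions, and the weights are discrete with tiny supports so that brute-force enumeration is possible.

```python
import cmath
import math
from itertools import product


def exp_moment(dist, alpha):
    """E[alpha^w] for a discrete distribution given as (value, probability) pairs."""
    return sum(p * cmath.exp(w * cmath.log(alpha)) for w, p in dist)


def product_form(dists, alpha):
    """Right-hand side: product over elements of E[alpha^{w_e}]."""
    result = 1.0 + 0.0j
    for dist in dists:
        result *= exp_moment(dist, alpha)
    return result


def brute_force(dists, alpha):
    """Left-hand side: E[alpha^{w(S)}] by enumerating all joint outcomes."""
    total = 0.0 + 0.0j
    for outcome in product(*dists):
        prob = 1.0
        weight = 0.0
        for w, p in outcome:
            prob *= p
            weight += w
        total += prob * cmath.exp(weight * cmath.log(alpha))
    return total


# A made-up solution S with three independent element weights.
S = [[(0.5, 0.3), (2.0, 0.7)], [(1.0, 0.5), (3.0, 0.5)], [(0.2, 1.0)]]
alpha = 0.8 * cmath.exp(0.4j)  # a complex base with |alpha| <= 1

lhs = brute_force(S, alpha)
rhs = product_form(S, alpha)

# For a positive real base, -log E[alpha^{w_e}] acts as an additive,
# nonnegative deterministic weight for each element.
costs = [-math.log(exp_moment(d, 0.5).real) for d in S]
total_cost = -math.log(brute_force(S, 0.5).real)
print(abs(lhs - rhs), sum(costs), total_cost)
```

With a positive real base, maximizing the expected utility is the same as minimizing the sum of the per-element costs, which is why the exponential case reduces to the deterministic problem.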

We still need to show how to compute $\mathbb{E}[\alpha^{w_e}]$. If $w_e$ is a discrete random variable with a polynomial
size support, we can easily compute $\mathbb{E}[\alpha^{w_e}]$ in polynomial time. If $w_e$ has an infinite discrete or continuous
support, we cannot compute $\mathbb{E}[\alpha^{w_e}]$ directly and may need to approximate it. We briefly discuss this issue
and its implications on our results in Appendix C.

2.1 Proof of Theorem 1 

Now, we prove Theorem 1. We start with some notation. We use $|c|$ and $\arg(c)$ to denote the absolute
value and the argument of the complex number $c$, respectively. In other words, $c = |c|(\cos(\arg(c)) +
i \sin(\arg(c))) = |c| e^{i \arg(c)}$. We always require $\arg(c) \in [0, 2\pi)$ for any $c \in \mathbb{C}$. Recall that we say the
exponential sum $\sum_{i=1}^{L} c_i \phi_i^x$ is an $\epsilon$-approximation for $\mu(x)$ if the following holds:

$$\Big|\mu(x) - \sum_{i=1}^{L} c_i \phi_i^x\Big| \le \epsilon \quad \forall x \ge 0.$$
We first show that if the utility function can be decomposed exactly into a short exponential sum, we can
approximate the optimal expected utility well.

Theorem 3 Assume $\tilde{\mu}(x) = \sum_{k=1}^{L} c_k \phi_k^x$ is the utility function where $|\phi_k| \le 1$ for $1 \le k \le L$. We also
assume that there is a pseudopolynomial algorithm for the exact version of A. Then, for any $\epsilon > 0$, there is
an algorithm that runs in time $(n/\epsilon)^{O(L)}$ and finds a solution $S$ such that

$$\big|\mathbb{E}[\tilde{\mu}(w(S))] - \mathbb{E}[\tilde{\mu}(w(\tilde{S}))]\big| < \epsilon$$

where $\tilde{S} = \arg\max_{S'} |\mathbb{E}[\tilde{\mu}(w(S'))]|$.

We use the scaling and rounding technique that has been used often in multi-criterion optimization
problems (e.g., [49, 46]). Since our objective function is not additive and not monotone, the general results
for multi-criterion optimization [46, 38, 49, 2] do not directly apply here. We briefly sketch our algorithm.
Let $\gamma = \delta = \ldots$ [value missing in the source]. For each $e \in U$, we associate it with a $2L$ dimensional integer vector
$(a_1(e), b_1(e), \ldots, a_L(e), b_L(e))$ where

$$a_i(e) = \left\lfloor \frac{-\ln |\mathbb{E}[\phi_i^{w_e}]|}{\gamma} \right\rfloor \quad \text{and} \quad b_i(e) = \left\lfloor \frac{\arg(\mathbb{E}[\phi_i^{w_e}])}{\delta} \right\rfloor.$$

$a_i(e)$ and $b_i(e)$ are the scaled and rounded versions of $-\ln |\mathbb{E}[\phi_i^{w_e}]|$ and $\arg(\mathbb{E}[\phi_i^{w_e}])$, respectively. Since
$|\phi_i^{w_e}| \le 1$, we can see that $a_i(e) \ge 0$ for any $e \in U$. We maintain $(JK)^L$ configurations where
$J = \lceil -\ln(\epsilon/L)/\gamma \rceil$ and $K = \lceil 2\pi n/\delta \rceil$. The number of configurations is $(n/\epsilon)^{O(L)}$. Each configuration $\sigma(\mathbf{a})$ is indexed
by a $2L$-dimensional vector $\mathbf{a} = (\alpha_1, \beta_1, \ldots, \alpha_L, \beta_L)$ where $1 \le \alpha_i \le J$ and $1 \le \beta_i \le K$ for
$i = 1, \ldots, L$. In other words, the configurations are $\sigma((1, 1, \ldots, 1, 1)), \ldots, \sigma((J, K, \ldots, J, K))$. For a vector
$\mathbf{a} = (\alpha_1, \beta_1, \ldots, \alpha_L, \beta_L)$, configuration $\sigma(\mathbf{a}) = 1$ if and only if there is a feasible solution $S \in \mathcal{F}$ such that
for all $j = 1, \ldots, L$, $\beta_j = \sum_{e \in S} b_j(e)$ and $\alpha_j = \min(J, \sum_{e \in S} a_j(e))$. Otherwise, $\sigma(\mathbf{a}) = 0$. Lemma 1
tells us the expected utility for the rounded instance is close to the true value of the expected utility. Lemma 2
shows we can compute those configurations in polynomial time.

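Lemma 1 For vector $\mathbf{a} = (\alpha_1, \beta_1, \ldots, \alpha_L, \beta_L)$, $\sigma(\mathbf{a}) = 1$ if and only if there is a solution $S$ such that
[displayed inequality missing in the source; only the summation limits $k = 1, \ldots, L$ survive].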

Lemma 2 Suppose there is a pseudopolynomial time algorithm for the exact version of A, which runs in
time polynomial in $n$ and $t$ ($t$ is the maximum integer in the instance of A). Then, we can compute the values
for these configurations in time $(n/\epsilon)^{O(L)}$.

The missing proofs can be found in Appendix B. Now, we can easily prove Theorem 3. 

Proof of Theorem 3: We first use the algorithm in Lemma 2 to compute the values for all configurations.
Then, we find the configuration $\sigma((\alpha_1, \beta_1, \ldots, \alpha_L, \beta_L))$ that has value 1 and that maximizes the quantity
$|\sum_{k=1}^{L} c_k e^{-\alpha_k \gamma + i \beta_k \delta}|$. The feasible solution $S$ corresponding to this configuration is our final solution. It is
easy to see that the theorem follows from Lemma 1. □
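The scaling and rounding step can be illustrated on a single base $\phi$ (so $L = 1$). The sketch below is an illustration under assumptions: the granularity `gamma = delta = eps / (L * n)` is a value chosen here, not one taken from the text, and the distributions and helper names are made up. It computes the rounded pair $(a(e), b(e))$ for each element, sums them into $(\alpha, \beta)$, and compares $e^{-\alpha\gamma + i\beta\delta}$ with the true product $\prod_{e \in S} \mathbb{E}[\phi^{w_e}]$.

```python
import cmath
import math


def exp_moment(dist, phi):
    """E[phi^w] for a discrete distribution given as (value, probability) pairs."""
    return sum(p * cmath.exp(w * cmath.log(phi)) for w, p in dist)


def rounded_vector(dist, phi, gamma, delta):
    """Scaled and rounded versions of -ln|E[phi^w]| and arg(E[phi^w])."""
    m = exp_moment(dist, phi)
    a = math.floor(-math.log(abs(m)) / gamma)
    # cmath.phase returns a value in (-pi, pi]; map it into [0, 2*pi).
    b = math.floor((cmath.phase(m) % (2.0 * math.pi)) / delta)
    return a, b


def configuration_value(solution, phi, gamma, delta):
    """Return (alpha, beta) for a solution and the estimate exp(-alpha*gamma + i*beta*delta)."""
    alpha = beta = 0
    for dist in solution:
        a, b = rounded_vector(dist, phi, gamma, delta)
        alpha += a
        beta += b
    return alpha, beta, cmath.exp(complex(-alpha * gamma, beta * delta))


eps, L = 0.01, 1
solution = [[(0.5, 0.3), (2.0, 0.7)], [(1.0, 0.5), (3.0, 0.5)], [(0.2, 1.0)]]
n = len(solution)
gamma = delta = eps / (L * n)  # assumed granularity, not taken from the text
phi = 0.9 * cmath.exp(0.3j)    # a base with |phi| <= 1

alpha, beta, estimate = configuration_value(solution, phi, gamma, delta)

true_value = 1.0 + 0.0j
for dist in solution:
    true_value *= exp_moment(dist, phi)

error = abs(estimate - true_value)
print(alpha, beta, error)
```

Because each element contributes a rounding error below $\gamma$ in the modulus exponent and below $\delta$ in the argument, the integer pair $(\alpha, \beta)$ determines $\mathbb{E}[\phi^{w(S)}]$ up to an error of order $n(\gamma + \delta)$, which is what allows the search to run over integer configurations.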

Theorem 1 can be readily obtained from Theorem 3 and the fact that $\tilde{\mu}$ is an $\epsilon$-approximation of $\mu$.
Proof of Theorem 1: Suppose $S$ is our solution and $S^*$ is the optimal solution for utility function $\mu$. From
Theorem 3, we know that $|\mathbb{E}[\tilde{\mu}(w(S))]| \ge |\mathbb{E}[\tilde{\mu}(w(S^*))]| - \epsilon$. Since $\tilde{\mu}$ is an $\epsilon$-approximation of $\mu$, we can see
that

$$\big|\mathbb{E}[\mu(w(S))] - \mathbb{E}[\tilde{\mu}(w(S))]\big| = \Big|\int (\mu(x) - \tilde{\mu}(x)) p_S(x) \, dx\Big| \le \int \epsilon \, p_S(x) \, dx \le \epsilon$$

for any solution $S$, where $p_S$ is the probability density function of $w(S)$. Therefore, we have

$$|\mathbb{E}[\mu(w(S))]| \ge |\mathbb{E}[\tilde{\mu}(w(S))]| - \epsilon \ge |\mathbb{E}[\tilde{\mu}(w(S^*))]| - 2\epsilon \ge |\mathbb{E}[\mu(w(S^*))]| - 3\epsilon.$$

The proof is complete. □



2.2 Approximating the Utility Function 

In this subsection, we discuss the issue of approximating $\mu$. In particular, we develop a generic algorithm
that takes as a subroutine an algorithm AP for approximating functions in a bounded interval domain, and
approximates $\mu(x)$ in the infinite domain $[0, +\infty)$. In the next subsection, we use the Fourier series
expansion as the choice of AP and show that important classes of utility functions can be approximated well.

There are many works on approximating functions using short exponential sums, e.g., the Fourier
decomposition approach [52], Prony's method [45], and many others [9, 10]. However, their approximations
are done over a finite interval domain, say $[-\pi, \pi]$, or over a finite number of discrete points. No error bound
can be guaranteed outside the domain. Our algorithm is a generic procedure that turns an algorithm that can
approximate functions over $[-\pi, \pi]$ into one that can approximate our utility function $\mu$ over $[0, +\infty)$, by
utilizing the fact that $\lim_{x \to \infty} \mu(x) = 0$.

Since $\lim_{x \to \infty} \mu(x) = 0$, for any $\epsilon$, there exists a point $T_\epsilon$ such that $\mu(x) \le \epsilon$ $\forall x > T_\epsilon$. Since we
assume the utility function $\mu$ is specified as a part of the problem but not a part of the input instance, $T_\epsilon$ is a
constant for any constant $\epsilon$. We also assume there is an algorithm AP that, for any function $f$ (under some
conditions specified later), can produce an exponential sum $\hat{f}(x) = \sum_{i=1}^{L} c_i \phi_i^x$ which is an $\epsilon$-approximation
of $f(x)$ in $[-\pi, \pi]$ such that $|\phi_i| \le 1$ and $L$ depends only on $\epsilon$ and $f$. In fact, we can assume w.l.o.g. that
AP can approximate $f(x)$ over $[-B, B]$ for any $B = O(1)$. This is because we can apply AP to the scaled
version $g(x) = f(x \cdot \frac{B}{\pi})$ (which is defined on $[-\pi, \pi]$) and then scale the obtained approximation $\hat{g}(x)$ back
to $[-B, B]$, i.e., the final approximation is $\hat{f}(x) = \hat{g}(\frac{\pi}{B} \cdot x)$. Scaling a function by a constant factor $\frac{B}{\pi}$
typically does not affect the smoothness of $f$ in any essential way and we can still apply AP. Recall that
our goal is to produce an exponential sum that is an $\epsilon$-approximation for $\mu(x)$ in $[0, +\infty)$. We denote this
procedure by ESUM.
Algorithm: ESUM

1. Initially, we slightly change function $\mu(x)$ to a new function $\hat{\mu}(x)$ as follows: We require $\hat{\mu}(x)$ to be a
"smooth" function in $[-2T_\epsilon, 2T_\epsilon]$ such that $\hat{\mu}(x) = \mu(x)$ for all $x \in [0, T_\epsilon]$; $\hat{\mu}(x) = 0$ for $|x| \ge 2T_\epsilon$.
We choose $\hat{\mu}(x)$ in $[-2T_\epsilon, 0]$ and $[T_\epsilon, 2T_\epsilon]$ such that $\hat{\mu}$ is smooth. We do not specify the exact
smoothness requirements now since they may depend on the choice of AP. Note that there may be
many ways to interpolate $\mu$ such that the above conditions are satisfied (see Example 1 below). The
only properties we need are: (1) $\hat{\mu}$ is amenable to algorithm AP; (2) $|\hat{\mu}(x) - \mu(x)| \le \epsilon$ $\forall x \ge 0$.

2. We apply AP to $f(x) = \eta^x \hat{\mu}(x)$ over domain $[-hT_\epsilon, hT_\epsilon]$ ($\eta > 1$ and $h \ge 2$ are constants to be
determined later). Suppose the resulting exponential sum is $\hat{f}(x) = \sum_{i=1}^{L} c_i \phi_i^x$, which is an $\epsilon$-approximation
of $f$ on $[-hT_\epsilon, hT_\epsilon]$.

3. Let $\tilde{\mu}(x) = \sum_{i=1}^{L} c_i (\frac{\phi_i}{\eta})^x$, which is our final approximation of $\mu(x)$ on $[0, \infty)$.

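Example 1 Consider the utility function $\mu(x) = \frac{1}{x+1}$. Let $T_\epsilon = \frac{1}{\epsilon} - 1$. So $\mu(x) < \epsilon$ for all $x > T_\epsilon$.
Now we create function $\hat{\mu}(x)$ according to the first step of ESUM. If we only require $\hat{\mu}(x)$ to be continuous,
then we can use, for instance, the following piecewise function: $\hat{\mu}(x) = \frac{1}{x+1}$, $x \in [0, T_\epsilon]$;
$\hat{\mu}(x) = -\frac{x}{T_\epsilon (T_\epsilon + 1)} + \frac{2}{T_\epsilon + 1}$, $x \in [T_\epsilon, 2T_\epsilon]$; $\hat{\mu}(x) = 0$, $x > 2T_\epsilon$; $\hat{\mu}(x) = \hat{\mu}(-x)$, $x < 0$.
It is easy to see that $\hat{\mu}$ is continuous and $\epsilon$-approximates $\mu$. □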
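The first step of ESUM for this example can be sketched in a few lines. The code below is an illustration under assumptions: it takes $\mu(x) = 1/(x+1)$ and $T_\epsilon = 1/\epsilon - 1$, uses a linear interpolation on $[T_\epsilon, 2T_\epsilon]$ and an even extension to negative $x$ (one of many admissible choices), and checks on a grid that the modified function stays within $\epsilon$ of $\mu$ for $x \ge 0$. The names `make_mu_hat` and `max_gap` are hypothetical.

```python
def mu(x):
    """Utility function of the example: mu(x) = 1 / (x + 1)."""
    return 1.0 / (x + 1.0)


def make_mu_hat(eps):
    """Build a continuous modification of mu with bounded support.

    Equal to mu on [0, T], linear down to 0 on [T, 2T], zero beyond 2T,
    and mirrored to negative x. The linear piece is an assumed choice.
    """
    T = 1.0 / eps - 1.0

    def mu_hat(x):
        x = abs(x)  # even extension to negative x
        if x <= T:
            return mu(x)
        if x <= 2.0 * T:
            return eps * (2.0 * T - x) / T
        return 0.0

    return T, mu_hat


eps = 0.1
T, mu_hat = make_mu_hat(eps)

# Check |mu_hat(x) - mu(x)| <= eps on a grid over [0, 50].
grid = [i * 0.01 for i in range(5001)]
max_gap = max(abs(mu_hat(x) - mu(x)) for x in grid)
print(T, max_gap)
```

The largest gap occurs just past $2T_\epsilon$, where the modified function is already 0 while $\mu$ is still slightly positive but below $\epsilon$.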

By setting $\eta = 2$ and

$$h \ge \frac{\log\big(\sum_{i=1}^{L} |c_i| / \epsilon\big)}{T_\epsilon}, \qquad (3)$$

we can show the following lemma.

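Lemma 3 $\tilde{\mu}(x)$ is a $2\epsilon$-approximation of $\mu(x)$.

Proof: We know that $|\hat{f}(x) - f(x)| \le \epsilon$ for $x \in [0, hT_\epsilon]$. Therefore, we have that

$$|\tilde{\mu}(x) - \hat{\mu}(x)| = \Big|\frac{\hat{f}(x)}{\eta^x} - \frac{f(x)}{\eta^x}\Big| \le \frac{\epsilon}{\eta^x} \le \epsilon.$$

Combining with $|\hat{\mu}(x) - \mu(x)| \le \epsilon$, we obtain $|\tilde{\mu}(x) - \mu(x)| \le 2\epsilon$ for $x \in [0, hT_\epsilon]$. For $x > hT_\epsilon$, we can
see that

$$|\tilde{\mu}(x)| = \Big|\sum_{i=1}^{L} c_i \Big(\frac{\phi_i}{\eta}\Big)^x\Big| \le \sum_{i=1}^{L} |c_i| \Big|\frac{\phi_i}{\eta}\Big|^x \le \frac{1}{2^x} \sum_{i=1}^{L} |c_i| \le \frac{1}{2^{hT_\epsilon}} \sum_{i=1}^{L} |c_i| \le \epsilon.$$

Since $\mu(x) \le \epsilon$ for $x > hT_\epsilon$, the proof is complete. □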
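The tail estimate in the proof can be checked numerically. The following sketch uses made-up coefficients $c_i$ and bases $\phi_i$ on the unit circle (they are assumptions for illustration, not the output of any particular AP), sets $\eta = 2$, picks $h$ to satisfy condition (3) with a base-2 logarithm, and evaluates $|\tilde{\mu}(x)|$ at points beyond $hT_\epsilon$.

```python
import cmath
import math


def mu_tilde(x, coeffs, phis, eta=2.0):
    """Evaluate sum_i c_i * (phi_i / eta)^x for a real x >= 0."""
    return sum(c * cmath.exp(x * cmath.log(phi / eta))
               for c, phi in zip(coeffs, phis))


eps = 0.05
T_eps = 1.0 / eps - 1.0  # T_eps for mu(x) = 1/(x+1), as in Example 1

# Made-up coefficients and unit-modulus bases, for illustration only.
coeffs = [4e11, (-3 + 4j) * 1e11, 2e11j]
phis = [cmath.exp(1j * t) for t in (0.0, 0.8, 2.0)]

abs_sum = sum(abs(c) for c in coeffs)
# Condition (3) with eta = 2; h must also be at least 2.
h = max(2.0, math.log2(abs_sum / eps) / T_eps)

# Sample points x >= h * T_eps and record |mu_tilde(x)|.
tail = [abs(mu_tilde(h * T_eps + 0.25 * j, coeffs, phis)) for j in range(200)]
print(h, max(tail))
```

Dividing each base by $\eta = 2$ forces every term to decay like $2^{-x}$, so even very large coefficients are damped below $\epsilon$ once $x$ exceeds $hT_\epsilon$.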

Remark: Since we do not know the $c_i$'s before applying AP, we need to set $h$ to be a constant (only depending
on $\mu$ and $\epsilon$) such that (3) is always satisfied. In particular, we need to provide an upper bound for $\sum_{i=1}^{L} |c_i|$.
In the next subsection, we use the Fourier series decomposition as the choice for AP, which allows us to
provide such a bound for a large class of functions.
2.3 A Particular Choice of AP: The Fourier Series Approach

Now, we discuss the choice of algorithm AP and the conditions that $f(x)$ needs to satisfy so that it is possible
to approximate $f(x)$ by a short exponential sum in a bounded interval. In fact, if we know in advance that
there is a short exponential sum that can approximate $f$, we can use the algorithms developed in [10, 11] (for
the continuous case) and [9] (for the discrete case). However, those works do not provide an easy characterization
of the class of functions. From now on, we restrict ourselves to the classic Fourier series technique, which
has been studied extensively and allows such characterizations.

Consider the partial sum of the Fourier series of the function $f(x)$:

$$(S_N f)(x) = \sum_{k=-N}^{N} c_k e^{ikx}$$

where the Fourier coefficient $c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx} \, dx$. It has $L = 2N + 1$ terms. Since $f(x)$ is a real
function, we have $c_k = \overline{c_{-k}}$ and the partial sum is also real. We are interested in the question of under which
conditions the function $S_N f$ converges to $f$ (as $N$ increases) and what the convergence rate is. Roughly
speaking, the more "smooth" $f$ is, the faster $S_N f$ converges to $f$. In general, this question is extremely
intricate and deep and is one of the central topics in the area of harmonic analysis. In the following, we give
one classic result about the convergence of Fourier series and show how to use it in our problem. Then we
provide a few concrete examples.
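As a rough numerical illustration of the partial sums, the sketch below approximates the coefficients $c_k$ with a uniform Riemann sum (the quadrature and the helper names are implementation choices made here, not part of the text) and measures the sup-norm error of $S_N f$ for the Lipschitz function $f(x) = |x|$ on $[-\pi, \pi]$.

```python
import cmath
import math


def fourier_coefficients(f, N, samples=2048):
    """Approximate c_k = (1/2pi) * int_{-pi}^{pi} f(x) e^{-ikx} dx for |k| <= N.

    Uses a uniform Riemann sum over one period, which is an implementation
    choice for this sketch.
    """
    xs = [-math.pi + 2.0 * math.pi * j / samples for j in range(samples)]
    fs = [f(x) for x in xs]
    return {
        k: sum(fx * cmath.exp(-1j * k * x) for fx, x in zip(fs, xs)) / samples
        for k in range(-N, N + 1)
    }


def partial_sum(coeffs, x):
    """Evaluate (S_N f)(x) = sum_k c_k e^{ikx}."""
    return sum(c * cmath.exp(1j * k * x) for k, c in coeffs.items())


def sup_error(f, N, grid_points=101):
    """Largest |S_N f(x) - f(x)| over a uniform grid on [-pi, pi]."""
    coeffs = fourier_coefficients(f, N)
    grid = [-math.pi + 2.0 * math.pi * j / (grid_points - 1)
            for j in range(grid_points)]
    return max(abs(partial_sum(coeffs, x) - f(x)) for x in grid)


f = abs  # f(x) = |x| is 1-Lipschitz and its periodic extension is continuous
err_small = sup_error(f, 8)
err_large = sup_error(f, 64)
print(err_small, err_large)
```

The error shrinks as $N$ grows, in line with the statement that smoother functions are approximated faster; a function with a jump would not show uniform convergence of this kind.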

We say $f$ satisfies the $\alpha$-Hölder condition if $|f(x) - f(y)| \le C |x - y|^{\alpha}$ for some constant $C$ and $\alpha > 0$
and any $x$ and $y$. The constant $C$ is called the Hölder coefficient of $f$, also denoted as $|f|_{C^{0,\alpha}}$. We say $f$ is
$C$-Lipschitz if $f$ satisfies the 1-Hölder condition with coefficient $C$.

Example 2 It is easy to check that the utility function $\mu$ in Example 1 is 1-Lipschitz since
$|\frac{d\mu(x)}{dx}| = \frac{1}{(x+1)^2} \le 1$ for $x \ge 0$. We can also see that $\tilde{\chi}$ in (1) is $\frac{1}{\delta}$-Lipschitz.

We need the following classic result of Jackson. 

Theorem 4 (See e.g., [47]) If $f$ satisfies the $\alpha$-Hölder condition, it holds that

$$|f(x) - (S_N f)(x)| \le O\left(\frac{|f|_{C^{0,\alpha}} \ln N}{N^{\alpha}}\right).$$
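The decay promised by Theorem 4 can be observed numerically. The following is a minimal sketch, not part of the algorithm itself: it assumes the test function $f(x) = |x|$ on $[-\pi, \pi]$ (which is 1-Lipschitz), approximates the coefficients $c_k$ with a midpoint rule, and measures the largest deviation of $S_N f$ from $f$ on a grid. The helper names and the sample counts are illustrative choices.

```python
import cmath
import math

def fourier_coefficients(f, N, samples=2048):
    """Approximate c_k = (1/2pi) * int_{-pi}^{pi} f(x) e^{-ikx} dx for |k| <= N (midpoint rule)."""
    step = 2 * math.pi / samples
    xs = [-math.pi + (j + 0.5) * step for j in range(samples)]
    fx = [f(x) for x in xs]
    return {
        k: sum(v * cmath.exp(-1j * k * x) for v, x in zip(fx, xs)) * step / (2 * math.pi)
        for k in range(-N, N + 1)
    }

def partial_sum(coeffs, x):
    """Evaluate (S_N f)(x); the imaginary part vanishes for real f up to rounding."""
    return sum(c * cmath.exp(1j * k * x) for k, c in coeffs.items()).real

def sup_error(f, N, grid=400):
    """Largest |f(x) - (S_N f)(x)| over an evenly spaced grid on [-pi, pi]."""
    coeffs = fourier_coefficients(f, N)
    points = [-math.pi + 2 * math.pi * j / grid for j in range(grid + 1)]
    return max(abs(f(x) - partial_sum(coeffs, x)) for x in points)

# f(x) = |x| is 1-Lipschitz, so the error should shrink roughly like (ln N) / N or faster.
errors = {N: sup_error(abs, N) for N in (4, 16, 64)}
for N, err in errors.items():
    print(N, err)
```

For this particular $f$ the error shrinks by roughly a factor of four each time $N$ is quadrupled, which is within the bound of the theorem.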

For later development, we need a few simple lemmas. The proofs of these lemmas are straightforward 
and thus omitted here. 

Lemma 4 Suppose $f : [a, c] \to \mathbb{R}$ is a continuous function which consists of two pieces $f_1 : [a, b] \to \mathbb{R}$
and $f_2 : [b, c] \to \mathbb{R}$. If both $f_1$ and $f_2$ satisfy the $\alpha$-Hölder condition with Hölder coefficient $C$, then
$|f|_{C^{0,\alpha}} \le 2C$.

Lemma 5 Suppose $f : [a, c] \to \mathbb{R}$ is a continuous function satisfying the $\alpha$-Hölder condition with Hölder
coefficient $C$. Then, for $g(x) = f(hx)$ for some constant $h$, we have $|g|_{C^{0,\alpha}} \le C h^{\alpha}$.

Using Theorem 4 and Lemma 5, we obtain the following corollary. 

Corollary 1 Suppose $f \in C^0[-hT_\epsilon, hT_\epsilon]$ satisfies the $\alpha$-Hölder condition with $|f|_{C^{0,\alpha}} = O(1)$ and
$N = O(hT_\epsilon(\frac{1}{\epsilon}\log\frac{1}{\epsilon})^{1/\alpha})$. Then, it holds that $|f(x) - (S_N f)(x)| \le \epsilon$ for $x \in [-hT_\epsilon, hT_\epsilon]$.

Everything is in place to prove Theorem 2. Consider the algorithm AP. If $\mu$ is $\alpha$-Hölder with coefficient
$O(1)$, we can construct $\hat{\mu}$ which is also $\alpha$-Hölder with coefficient $O(1)$, by Lemma 4. Then, we can easily
see that $f(x) = \eta^x \hat{\mu}(x)$ is also $\alpha$-Hölder with coefficient $O(1)$ in $[-hT_\epsilon, hT_\epsilon]$ for $\eta = 2$. Hence, we
can apply Corollary 1. By Lemma 3, we complete the proof of Theorem 2.

How to Choose h: Now, we discuss the issue left in Section 2.2, that is, how to choose $h$ (the value should
be independent of the $c_k$s and $L$) to satisfy (3), when $\mu$ satisfies the $\alpha$-Hölder condition for some $\alpha > 1/2$.
Indeed, we can choose h = 0{jr log ^). See Appendix D for the details. 

3 Applications 

We first consider the two utility functions $\chi(x)$ and $\tilde{\chi}(x)$ presented in the introduction. Note that maximizing
$E[\chi(w(S))]$ is equivalent to maximizing $\Pr(w(S) \le 1)$. The following lemma is straightforward.

Lemma 6 For any solution $S$,

$$\Pr(w(S) \le 1) \le E[\tilde{\chi}(w(S))] \le \Pr(w(S) \le 1 + \delta).$$
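The sandwich in Lemma 6 is easy to check on a small discrete example. The sketch below assumes that $\tilde{\chi}$ equals 1 on $[0, 1]$, decreases linearly to 0 on $[1, 1 + \delta]$, and is 0 afterwards; this shape is an assumption based on the description of $\tilde{\chi}$ as a continuous, piecewise linear version of the threshold function. The distribution of $w(S)$ is made up for illustration.

```python
def chi_tilde(x, delta):
    """Assumed continuous threshold: 1 for x <= 1, linear down to 0 at 1 + delta, 0 afterwards."""
    if x <= 1:
        return 1.0
    if x >= 1 + delta:
        return 0.0
    return (1 + delta - x) / delta

def sandwich(dist, delta):
    """dist is a list of (value, probability) pairs describing the distribution of w(S)."""
    lower = sum(p for v, p in dist if v <= 1)                # Pr(w(S) <= 1)
    middle = sum(p * chi_tilde(v, delta) for v, p in dist)   # E[chi_tilde(w(S))]
    upper = sum(p for v, p in dist if v <= 1 + delta)        # Pr(w(S) <= 1 + delta)
    return lower, middle, upper

# Hypothetical distribution of the total weight w(S).
dist = [(0.6, 0.3), (1.0, 0.2), (1.05, 0.25), (1.2, 0.15), (2.0, 0.1)]
lower, middle, upper = sandwich(dist, delta=0.1)
print(lower, middle, upper)
```

The middle value always lies between the two probabilities because $\tilde{\chi}$ is pointwise between the indicator of $[0, 1]$ and the indicator of $[0, 1 + \delta]$.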

Corollary 2 Suppose there is a pseudopolynomial time algorithm for the exact version of A. Then, for any
fixed constants $\epsilon > 0$ and $\delta > 0$, there is an algorithm that runs in time polynomial in $n$ and produces a
solution $S \in \mathcal{F}$ such that

$$\Pr(w(S) \le 1 + \delta) + \epsilon \ge \max_{S' \in \mathcal{F}} \Pr(w(S') \le 1).$$

Proof: By Theorem 1, Theorem 2 and Lemma 6, we can easily obtain the corollary. Note that we can choose 
= 2 for any e > 0. Thus h = 0(log ^) and L = 0(^ log I). □ 

Now, let us see some applications of our general results to specific problems. 

Stochastic Shortest Path: Finding a path with the exact target length (we allow non-simple paths)^1 can be
easily done in pseudopolynomial time by dynamic programming. Therefore, as discussed in Section 1.1,
Corollary 2 generalizes several results for stochastic shortest path in prior work [43, 41].
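The pseudopolynomial dynamic program for the exact version can be sketched as follows. This is one possible formulation, assuming a directed graph with positive integer edge lengths; the table `reach[t][v]` records whether some (not necessarily simple) walk from the source to `v` has total length exactly `t`. The function name and input encoding are illustrative.

```python
def exact_length_walk_exists(n, edges, source, target, length):
    """Decide whether a walk from source to target of total length exactly `length` exists.

    n      -- number of nodes, labelled 0 .. n-1
    edges  -- list of (u, v, w) directed edges with positive integer lengths w
    Runs in O(length * len(edges)) time, i.e. pseudopolynomial in the target length.
    """
    reach = [[False] * n for _ in range(length + 1)]
    reach[0][source] = True
    for t in range(1, length + 1):
        for u, v, w in edges:
            # Extend a walk of length t - w ending at u by the edge (u, v).
            if w <= t and reach[t - w][u]:
                reach[t][v] = True
    return reach[length][target]

# Small example: 0 -> 1 (length 2), 1 -> 2 (length 3), and a self-loop 1 -> 1 (length 1).
edges = [(0, 1, 2), (1, 2, 3), (1, 1, 1)]
print(exact_length_walk_exists(3, edges, 0, 2, 5))  # a walk 0 -> 1 -> 2 has length 5
print(exact_length_walk_exists(3, edges, 0, 2, 4))  # no walk of length 4 exists
```

Allowing repeated nodes is what keeps the recurrence this simple; restricting to simple paths would require remembering the visited set.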

Stochastic Spanning Tree: Our objective is to find a spanning tree $T$ in the given probabilistic graph such
that $\Pr(w(T) \le 1)$ is maximized. Polynomial time algorithms have been developed for Gaussian distributed
edges [30, 23]. To the best of our knowledge, no approximation algorithm with provable guarantee is known
for other distributions. Noticing there exists a pseudopolynomial time algorithm for the exact spanning tree
problem [6], we can directly apply Corollary 2.

Stochastic k-Median on Trees: The problem asks for a set $S$ of $k$ nodes in the given probabilistic tree $G$
such that $\Pr(\sum_{v \in V(G)} \mathrm{dis}(v, S) \le 1)$ is maximized, where $\mathrm{dis}(v, S)$ is the minimum distance from $v$ to any
node in $S$ in the tree metric. The k-median problem can be solved optimally in polynomial time on trees by
dynamic programming [31]. In fact, we can easily modify the dynamic program to get a pseudopolynomial
time algorithm for the exact version. We omit the details.

Stochastic Knapsack with Random Sizes: We are given a set $U$ of $n$ items. Each item $i$ has a random size
$w_i$ and a deterministic profit $v_i$. We are also given a positive constant $\gamma \le 1$. The goal is to find a subset
$S$ such that $\Pr(w(S) \le 1) \ge \gamma$ and the total profit $v(S) = \sum_{i \in S} v_i$ is maximized.

^1 The exact version of simple path is NP-hard, since it includes the Hamiltonian path problem as a special case.

If the profits of the items are polynomially bounded integers, we can see the optimal profit is also
a polynomially bounded integer. We can first guess the optimal profit. For each guess $g$, we solve the
following problem: find a subset $S$ of items such that the total profit of $S$ is exactly $g$ and $E[\tilde{\chi}(w(S))]$ is
maximized. The exact version of the deterministic problem is to find a solution $S$ with a given total size
and a given total profit, which can be easily solved in pseudopolynomial time by dynamic programming.
Therefore, by Corollary 2, we can easily show that we can find in polynomial time a set $S$ of items such that
the total profit $v(S)$ is at least the optimum and $\Pr(w(S) \le 1 + \epsilon) \ge (1 - \epsilon)\gamma$ for any constants $\epsilon$ and $\gamma$.
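The exact version with a given total size and a given total profit admits a standard reachability-style dynamic program. The sketch below is one way to write it, assuming nonnegative integer sizes and profits (for instance after scaling); the function name and the dictionary-based state table are illustrative choices.

```python
def exact_size_and_profit(items, size_target, profit_target):
    """Find a subset whose total size and total profit both hit the targets exactly.

    items -- list of (size, profit) pairs of nonnegative integers
    Returns a sorted list of item indices, or None if no such subset exists.
    The number of states is at most (size_target + 1) * (profit_target + 1).
    """
    # parent[state] = (previous state, index of the item added to reach `state`).
    parent = {(0, 0): None}
    for idx, (size, profit) in enumerate(items):
        # Iterate over a snapshot so that each item is used at most once.
        for s, p in list(parent):
            state = (s + size, p + profit)
            if state[0] <= size_target and state[1] <= profit_target and state not in parent:
                parent[state] = ((s, p), idx)
    goal = (size_target, profit_target)
    if goal not in parent:
        return None
    chosen = []
    while parent[goal] is not None:
        goal, idx = parent[goal]
        chosen.append(idx)
    return sorted(chosen)

items = [(2, 3), (3, 4), (4, 5), (5, 6)]
subset = exact_size_and_profit(items, size_target=7, profit_target=9)
print(subset)
```

Any subset returned is a witness for the exact problem; when several witnesses exist, which one is found depends on the item order.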

If the profits are general integers, we can use the standard scaling technique to get a $(1 - \epsilon)$-approximation
for the total profit. See Appendix E for the details. In sum, we have obtained the following result.

Theorem 5 For any constants $\epsilon > 0$ and $\gamma > 0$, there is a polynomial time algorithm to compute a set $S$ of
items such that the total profit $v(S)$ is within a $1 - \epsilon$ factor of the optimum and $\Pr(w(S) \le 1 + \epsilon) \ge (1 - \epsilon)\gamma$.

Recently, Bhalgat et al. [13, Theorem 8.1] obtained the same result, with a running time of $n^{2^{\mathrm{poly}(1/\epsilon)}}$, while
our running time is $n^{\mathrm{poly}(1/\epsilon)}$.

Moreover, we can easily extend our algorithm to generalizations of the knapsack problem if the corresponding
exact version has a pseudopolynomial time algorithm. For example, we can get the same result for
the partially ordered knapsack problem with tree constraints [22, 49]. In this problem, items must be chosen
in accordance with specified precedence constraints; these precedence constraints form a partial order
whose underlying undirected graph is a tree (or forest). A pseudopolynomial algorithm for this problem is
presented in [49].

Stochastic Knapsack with Random Profits: We are given a set $U$ of $n$ items. Each item $i$ has a deterministic
size $w_i$ and a random profit $v_i$. The goal is to find a subset of items that can be packed into a
knapsack with capacity 1 such that the probability that the profit is at least a given threshold $T$ is maximized.
Henig [29] and Carraway et al. [14] studied this problem for normally distributed profits and presented
dynamic programming and branch and bound heuristics to solve this problem optimally.

We can solve the equivalent problem of minimizing the probability that the profit is at most the given
threshold. It is straightforward to modify our algorithm to work for the minimization problem and we can
also get an $\epsilon$ additive error for any $\epsilon > 0$. In fact, we can show that violation of the capacity constraint is
necessary unless P = NP. See Appendix F for the details.

Theorem 6 If the optimal probability is $\Omega(1)$, we can find in polynomial time a subset $S$ of items such that
$\Pr(v(S) \ge (1 - \epsilon)T) \ge (1 - \epsilon)OPT$ and $w(S) \le 1 + \epsilon$, for any constant $\epsilon > 0$.

4 Extensions 

In this section, we discuss some extensions to our basic approximation scheme. We first consider optimizing 
a constant number of utility functions in Section 4.1. Then, we study the problem where the weight of each 
element is a random vector in Section 4.2. 

4.1 Multiple Utility Functions

The problem we study in this section contains a set $U$ of $n$ elements. Each element $e$ has a random weight
$w_e$. We are also given $d$ utility functions $\mu_1, \ldots, \mu_d$ and $d$ positive numbers $\lambda_1, \ldots, \lambda_d$. We assume $d$ is a
constant. A feasible solution consists of $d$ subsets of elements that satisfy some property. Our objective is
to find a feasible solution $S_1, \ldots, S_d$ such that $E[\mu_i(w(S_i))] \ge \lambda_i$ for all $1 \le i \le d$.

We can easily extend our basic approximation scheme to the multiple utility functions case as follows.
We decompose these utility functions into short exponential sums using ESUM as before. Then, for each
utility function, we maintain $(n/\epsilon)^{O(L)}$ configurations. Therefore, we have $(n/\epsilon)^{O(dL)}$ configurations in
total and we would like to compute the values for these configurations. We denote the deterministic version
of the problem under consideration by A. The exact version of A asks for a feasible solution $S_1, \ldots, S_d$
such that the total weight of $S_i$ is exactly the given number $t_i$ for all $i$. Following an argument similar to
Lemma 2, we can easily get the following generalization of Theorem 1.

Theorem 7 Assume that there is a pseudopolynomial algorithm for the exact version of A. Further assume
that given any $\epsilon > 0$, we can $\epsilon$-approximate each utility function by an exponential sum with at most $L$
terms. Then, there is an algorithm that runs in time $(n/\epsilon)^{O(dL)}$ and finds a feasible solution $S_1, \ldots, S_d$ such
that $E[\mu_i(w(S_i))] \ge \lambda_i - \epsilon$ for $1 \le i \le d$, if there is a feasible solution for the original problem.

Now let us consider two simple applications of the above theorem. 

Stochastic Multiple Knapsack: In this problem we are given a set $U$ of $n$ items, $d$ knapsacks with capacity
1, and $d$ constants $0 \le \gamma_i \le 1$. We assume $d$ is a constant. Each item $i$ has a random size $w_i$ and a
deterministic profit $v_i$. Our objective is to find $d$ disjoint subsets $S_1, \ldots, S_d$ such that $\Pr(w(S_i) \le 1) \ge \gamma_i$
for all $1 \le i \le d$ and $\sum_{i=1}^{d} v(S_i)$ is maximized. The exact version of the problem is to find a packing such
that the load of each knapsack $i$ is exactly the given value $t_i$. It is not hard to show this problem can be
solved in pseudopolynomial time by standard dynamic programming. If the profits are general integers, we
also need the scaling technique as in stochastic knapsack with random sizes (Appendix E). In sum, we can
get the following generalization of Theorem 5.

Theorem 8 For any constants $d \in \mathbb{N}$, $\epsilon > 0$ and $0 \le \gamma_i \le 1$ for $1 \le i \le d$, there is a polynomial time
algorithm to compute $d$ disjoint subsets $S_1, \ldots, S_d$ such that the total profit $\sum_{i=1}^{d} v(S_i)$ is within a $1 - \epsilon$
factor of the optimum and $\Pr(w(S_i) \le 1 + \epsilon) \ge (1 - \epsilon)\gamma_i$ for $1 \le i \le d$.

Stochastic Multidimensional Knapsack: In this problem we are given a set $U$ of $n$ items and a constant
$0 \le \gamma \le 1$. Each item $i$ has a deterministic profit $v_i$ and a random size which is a random $d$-dimensional
vector $w_i = \{w_{i1}, \ldots, w_{id}\}$. We assume $d$ is a constant. Our objective is to find a subset $S$ of items such
that $\Pr(\bigwedge_{j=1}^{d}(\sum_{i \in S} w_{ij} \le 1)) \ge \gamma$ and the total profit is maximized. This problem can also be thought of as the
fixed set version of the stochastic packing problem considered in [19, 13]. We first assume the components
of each size vector are independent. The correlated case will be addressed in the next subsection.

For ease of presentation, we assume $d = 2$ from now on. Extension to general constant $d$ is straightforward.
We can solve the problem by casting it into a multiple utility problem as follows. For each item
$i$, we create two copies $i_1$ and $i_2$. The copy $i_j$ has a random weight $w_{ij}$. A feasible solution consists of
two sets $S_1$ and $S_2$ such that $S_1$ ($S_2$) only contains the first (second) copies of the elements and $S_1$ and
$S_2$ correspond to exactly the same subset of original elements. We enumerate all pairs $(\gamma_1, \gamma_2)$ such
that $\gamma_1\gamma_2 \ge \gamma$ and $\gamma_i \in [\gamma, 1]$ is a power of $1 - \epsilon$ for $i = 1, 2$. Clearly, there are a polynomial number
of such pairs. For each pair $(\gamma_1, \gamma_2)$, we solve the following problem: find a feasible solution $S_1, S_2$ such
that $\Pr(\sum_{i \in S_j} w_{ij} \le 1) \ge \gamma_j$ for all $j = 1, 2$ and the total profit is maximized. Using the scaling technique
and Theorem 7 for optimizing multiple utility functions, we can get a $(1 - \epsilon)$-approximation for the optimal
profit and

$$\Pr\Big(\bigwedge_{j=1}^{2}\Big(\sum_{i \in S_j} w_{ij} \le 1\Big)\Big) = \prod_{j=1}^{2} \Pr\Big(\sum_{i \in S_j} w_{ij} \le 1\Big) \ge (1 - O(\epsilon))\gamma_1\gamma_2 \ge (1 - O(\epsilon))\gamma.$$

We note that the same result for independent components can be also obtained by using the discretization
technique developed for the adaptive version of the problem in [13]^2. If the components of each size vector
are correlated, we cannot decompose the problem into two 1-dimensional utilities as in the independent
case. Now, we introduce a new technique to handle the correlated case.

^2 With some changes of the discretization technique, the correlated case can be also handled [12].

4.2 Multidimensional Weight 

The general problem we study contains a set $U$ of $n$ elements. Each element $i$ has a random weight vector
$w_i = (w_{i1}, \ldots, w_{id})$. We assume $d$ is a constant. We are also given a utility function $\mu : \mathbb{R}^d \to \mathbb{R}^+$. A
feasible solution is a subset of elements satisfying some property. We use $w(S)$ as a shorthand notation for the
vector $(\sum_{i \in S} w_{i1}, \ldots, \sum_{i \in S} w_{id})$. Our objective is to find a feasible solution $S$ such that $E[\mu(w(S))]$ is
maximized.

From now on, $x$ and $k$ denote $d$-dimensional vectors and $kx$ (or $k \cdot x$) denotes the inner product of $k$ and $x$.
As before, we assume $\mu(x) \in [0, 1]$ for all $x \ge 0$ and $\lim_{|x| \to +\infty} \mu(x) = 0$, where $|x| = \max(x_1, \ldots, x_d)$.
Our algorithm is almost the same as in the one-dimensional case and we briefly sketch it here. We first notice
that expected utilities decompose for exponential utility functions, i.e., IE[^*^'"'('^)] = Hies ^I'?^'^ '^'l- Then, 
we attempt to e-approximate the utility function //(x) by a short exponential sum Yl\k\<N '^k(t>k^ (there 
are OiN'^) terms). If this can be done, E[^'= "'('^)] can be approximated by T.\k\<N CfeE[0''-™('5)]. Using 
the same argument as in Theorem 1, we can show that there is a polynomial time algorithm that can find a
feasible solution $S$ with $E[\mu(w(S))] \ge OPT - \epsilon$ for any $\epsilon > 0$, provided that a pseudopolynomial algorithm
exists for the exact version of the deterministic problem.
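The decomposition of expected exponential utility over independent elements, which the sketch above relies on, can be verified by brute force. In the following illustration the distributions, the base `phi` and the vector `k` are made-up values; the two components of each weight vector may be correlated, while different items are assumed independent.

```python
import cmath
import itertools

def inner(k, w):
    """Inner product k . w of two vectors of equal length."""
    return sum(kj * wj for kj, wj in zip(k, w))

def expected_exp_single(dist, phi, k):
    """E[phi^(k . w_i)] for one item; dist is a list of (weight_vector, probability) pairs."""
    return sum(p * phi ** inner(k, w) for w, p in dist)

def expected_exp_joint(dists, phi, k):
    """E[phi^(k . w(S))] by enumerating all joint outcomes of the independent items."""
    total = 0j
    for outcome in itertools.product(*dists):
        prob = 1.0
        weight = [0.0] * len(k)
        for w, p in outcome:
            prob *= p
            weight = [a + b for a, b in zip(weight, w)]
        total += prob * phi ** inner(k, weight)
    return total

# Three hypothetical items with 2-dimensional weight vectors.
dists = [
    [((0.2, 0.5), 0.4), ((0.7, 0.1), 0.6)],
    [((0.3, 0.3), 0.5), ((0.9, 0.8), 0.5)],
    [((0.1, 0.4), 1.0)],
]
phi = 0.9 * cmath.exp(0.3j)
k = (1, 2)

lhs = expected_exp_joint(dists, phi, k)
rhs = 1 + 0j
for dist in dists:
    rhs *= expected_exp_single(dist, phi, k)
print(abs(lhs - rhs))
```

The joint enumeration is exponential in the number of items, whereas the product form needs only one expectation per item; this is the separability the algorithm exploits.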

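To approximate the utility function $\mu(x)$, we need the multidimensional Fourier series expansion of
a function $f : \mathbb{R}^d \to \mathbb{C}$ (assuming $f$ is $2\pi$-periodic in each axis): $f(x) \sim \sum_{k \in \mathbb{Z}^d} c_k e^{ik \cdot x}$ where
$c_k = \frac{1}{(2\pi)^d}\int_{x \in [-\pi, \pi]^d} f(x) e^{-ik \cdot x}\,dx$. The rectangular partial sum is defined to be

$$S_N f(x) = \sum_{|k_1| \le N} \cdots \sum_{|k_d| \le N} c_k e^{ik \cdot x}.$$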

It is known that the rectangular partial sum $S_N f(x)$ converges uniformly to $f(x)$ in $[-\pi, \pi]^d$ for many
function classes as $N$ tends to infinity. In fact, a generalization of Theorem 4 to $[-\pi, \pi]^d$ also holds [4]: If $f$
satisfies the $\alpha$-Hölder condition, then

$$|f(x) - (S_N f)(x)| \le O\left(\frac{|f|_{C^{0,\alpha}} \ln^d N}{N^{\alpha}}\right) \quad \text{for } x \in [-\pi, \pi]^d.$$

Now, we have an algorithm AP that can approximate a function in a bounded domain. It is also straightforward
to extend ESUM to the multidimensional case. Hence, we can $\epsilon$-approximate $\mu$ by a short exponential
sum in $[0, +\infty)^d$, thereby proving the multidimensional generalization of Theorem 2. Let us consider an
application of our result.

Stochastic Multidimensional Knapsack (Revisited): We consider the case where the components of each
weight vector can be correlated. Note that the utility function $\chi_2$ corresponding to this problem is the two
dimensional threshold function: $\chi_2(x, y) = 1$ if $x \le 1$ and $y \le 1$; $\chi_2(x, y) = 0$ otherwise. As in the one
dimensional case, we need to consider a continuous version $\tilde{\chi}_2$ of $\chi_2$ (see Figure 1(3)). By the result in this
section and a generalization of Lemma 6 to higher dimension, we can get the following.

Theorem 9 For any constants $d \in \mathbb{N}$, $\epsilon > 0$ and $0 \le \gamma \le 1$, there is a polynomial time algorithm for finding
a set $S$ of items such that the total profit $v(S)$ is within a $1 - \epsilon$ factor of the optimum and
$\Pr(\bigwedge_{j=1}^{d}(\sum_{i \in S} w_{ij} \le 1 + \epsilon)) \ge (1 - \epsilon)\gamma$.

5 Discussions



Convergence of Fourier series: The convergence of the Fourier series of a function is a classic topic in harmonic
analysis. Whether the Fourier series converges to the given function and the rate of the convergence
typically depend on a variety of smoothness conditions of the function. We refer the readers to [52] for a
more comprehensive treatment of this topic. We note that we could obtain a smoother version of $\chi$ (e.g., see
Figure 1(2)), instead of the piecewise linear $\tilde{\chi}$, and then use Theorem 4 to obtain a better bound for $L$. This
would result in an even better running time. Our choice is simply for the ease of presentation.

Discontinuous utility functions: If the utility function $\mu$ is discontinuous, e.g., the threshold function, then
the partial Fourier series behaves poorly around the discontinuity (this is known as the Gibbs phenomenon).
However, informally speaking, as the number of Fourier terms increases, the poorly-behaved strip around
the edge becomes narrower. Therefore, if the majority of the probability mass of our solution lies outside
the strip, we can still guarantee a good approximation of the expected utility. There are also techniques to
reduce the effects of the Gibbs phenomenon (see e.g., [25]). We leave the problem of directly dealing with
discontinuous utility functions, especially the threshold function, to obtain a true approximation (instead of
a bi-criterion approximation) as an interesting open problem.

6 Conclusion 

We consider the problem of maximizing expected utility for many stochastic combinatorial problems, such 
as shortest path, spanning tree and knapsack. We develop a polynomial time approximation scheme with 
additive error e for any e > 0. A key ingredient in our algorithm is to decompose the utility function into 
a short exponential sum. In this paper, we use the Fourier series technique to fulfill the task. Exploring
other decomposition approaches is an interesting direction for future work. Our general approximation framework may
be useful for other stochastic optimization problems. One major open problem is to obtain approximations 
with reasonable multiplicative factors, or nontrivial inapproximability results, for the utility maximization 
problem. 
A St. Petersburg Paradox

In this section, we briefly describe the St. Petersburg paradox. The paradox is named after Daniel Bernoulli's
presentation of the problem, published in 1738 in the Commentaries of the Imperial Academy of Science
of Saint Petersburg. Consider the following game: you pay a fixed fee $X$ to enter the game. In the game, a
fair coin is tossed repeatedly until a tail appears, ending the game. The payoff of the game is $2^k$ where $k$ is
the number of heads that appear, i.e., you win 1 dollar if a tail appears on the first toss, 2 dollars if a head
appears on the first toss and a tail on the second, 4 dollars if a head appears on the first two tosses and a tail
on the third and so on. The question is what would be a fair fee $X$ to enter the game? First, it is easy to see
that the expected payoff is

$$E[\text{payoff}] = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 4 + \frac{1}{16} \cdot 8 + \cdots = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots = \sum_{k=1}^{\infty} \frac{1}{2} = \infty.$$

If we use the expected payoff as a criterion for decision making, we should therefore play the game at any
finite price $X$ (no matter how large $X$ is) since the expected payoff is always larger. However, researchers
have done extensive surveys and found that not many people would pay even 25 dollars to play the game
[37], which significantly deviates from what the expected value criterion predicts. In fact, the paradox can
be resolved by the expected utility theory with a logarithmic utility function, suggested by Bernoulli himself.
We refer interested readers to [37, 1] for more information.
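The contrast between the two criteria can be made concrete with a short computation. The sketch below truncates the game after a fixed number of tosses and assumes the base-2 logarithm as the utility function (the base only changes the result by a constant factor); the function name is illustrative.

```python
import math

def truncated_expectations(max_tosses):
    """Expected payoff and expected log2-utility, counting only games that end within max_tosses tosses."""
    expected_payoff = 0.0
    expected_log_utility = 0.0
    for k in range(max_tosses):       # k heads followed by one tail
        prob = 0.5 ** (k + 1)
        payoff = 2 ** k
        expected_payoff += prob * payoff
        expected_log_utility += prob * math.log2(payoff)
    return expected_payoff, expected_log_utility

for m in (10, 20, 40):
    print(m, truncated_expectations(m))
```

The truncated expected payoff grows linearly with the cutoff (each possible outcome contributes exactly one half), while the expected logarithmic utility converges to 1.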



B Missing Proofs 

Proof of Lemma 1: We first notice that 



k=l k=l 



Therefore, it suffices to show that for all A; = 1, . . . , L, 



L 



First, we can see that 

arg(E[</.f ^)]) - f3k5 = ^(arg(E[0-^]) - h{e)6) < < 

If YleeS '^k{'^) > we know that 

-ln(|E[<^^(^)]|) = J](-ln(|E[</,-«|)) > J7. 



e 

T 



In this case, we have afe = J. Thus, we have 



|E[</.; 



-afc7 



ee5 
If YleeS ^ki^) — t^^t

'y = 



ee5 eeS 

Since the derivative of $e^x$ is less than 1 for $x < 0$, we can get

MS) 



L 



For any two complex numbers $a, b$ with $|a| \le 1$ and $|b| \le 1$, if $\big||a| - |b|\big| \le h$ and $|\arg(a) - \arg(b)| \le h$,
we can easily show that $|a - b| \le O(h)$. The proof is complete. □

Proof of Lemma 2: For each element e, we associate a new vector ag = (ai, 61, . . . , a^, If ai{e) > J, 
we let ai{e) = n{J + 1) and Oj(e) = ai(e) otherwise. Let bi{e) = bi{e) for all e and f. For each node v 
and each vector a = (ai, . . . , ai,, such that < < n^(J + l)Vi and < < K^i, we want to 
compute the value a^(a) which is defined as follows: a^(a) = 1 if and only if there is a feasible solution 
S e T such that for all j = 1,. . . ,L, Pj = Yle^s^ji^)' ™^ ~ See5%(^) "^ore compactly, 
a = X]ee5 ^e) ; o-„(a) = otherwise. 

We can encode each vector as a nonnegative integer upper bounded by {n^JK)^ = (^)'^(^). Then, 
determining the value of a configuration is equivalent to determining whether there is a feasible solution S 
such that the total weight of S is exactly a given value. Suppose the pseudopolynomial time algorithm for 
the exact version of A runs in time PA{n, t) for some polynomial Pa- Therefore, the value of each such 
at,(a) can be also computed in time PA{n, (f )'^*^^^) = (l)'^(^). Since J and K are bounded by (f )'^'^^\ 
the number of configuration is The value of c7((ai, /3i, . . . ,aL, Pl)) can be easily answered from 

the values of as as follows : 

1. If < J Vz, (Tt,(a) = cj^(a); 

2. Denote a' = {a[, . . . , ct'^, and S = {i \ ai = J}, cr^(a) = maxa'(a^(a') | = /3j Vi, a[ > 

jyi € 5, a[ = aiii ^ S). 

The total running time is $(n/\epsilon)^{O(L)} \times (n/\epsilon)^{O(L)} = (n/\epsilon)^{O(L)}$. □



C Computing $E[a^{w_e}]$

If $X$ is a random variable, then the characteristic function of $X$ is defined as

$$G(z) = E[e^{izX}].$$

We can see that $E[a^{w_e}]$ is nothing but the value of the characteristic function of $w_e$ evaluated at $-i \ln a$ (here $\ln$
is the complex logarithm function). For many important distributions, including negative binomial, Poisson,
exponential, Gaussian, Chi-square and Gamma, a closed-form characteristic function is known. See [44] for
a more comprehensive list.

Example 3 Consider the Poisson distributed $w_e$ with mean $\lambda$, i.e., $\Pr(w_e = k) = \lambda^k e^{-\lambda}/k!$. Its
characteristic function is known to be $G(z) = e^{\lambda(e^{iz} - 1)}$. Therefore,

$$E[a^{w_e}] = G(-i \ln a) = e^{\lambda(a - 1)}.$$

Example 4 For the Gaussian distribution $N(\mu, \sigma^2)$, we know its characteristic function is $G(z) = e^{i\mu z - \frac{1}{2}\sigma^2 z^2}$.
Therefore,

$$E[a^{w_e}] = G(-i \ln a) = e^{\mu \ln a + \frac{1}{2}\sigma^2 \ln^2 a} = a^{\mu + \frac{\sigma^2}{2}\ln a}.$$
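Both closed forms can be sanity-checked numerically. The sketch below compares the Poisson formula against a direct series summation for a complex base $a$, and the Gaussian formula against a midpoint-rule integral for a real base $a > 0$. The parameter values, the truncation at 200 terms and the integration window are illustrative assumptions.

```python
import cmath
import math

def poisson_expectation_series(a, lam, terms=200):
    """E[a^w] for w ~ Poisson(lam), summing Pr(w = k) * a^k term by term."""
    total = 0j
    term = cmath.exp(-lam)            # Pr(w = 0) * a^0
    for k in range(terms):
        total += term
        term *= a * lam / (k + 1)     # next term: multiply by a * lam / (k + 1)
    return total

def poisson_expectation_closed(a, lam):
    """Closed form G(-i ln a) = exp(lam * (a - 1))."""
    return cmath.exp(lam * (a - 1))

def gaussian_expectation_numeric(a, mu, sigma, half_width=12.0, steps=20000):
    """E[a^w] for w ~ N(mu, sigma^2) and real a > 0, midpoint rule on mu +/- half_width * sigma."""
    lo = mu - half_width * sigma
    step = 2 * half_width * sigma / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * step
        density = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += a ** x * density * step
    return total

def gaussian_expectation_closed(a, mu, sigma):
    """Closed form a^(mu + sigma^2 * ln(a) / 2)."""
    return a ** (mu + 0.5 * sigma ** 2 * math.log(a))

a_complex = 0.8 * cmath.exp(0.5j)
print(poisson_expectation_series(a_complex, 3.0), poisson_expectation_closed(a_complex, 3.0))
print(gaussian_expectation_numeric(0.7, 1.5, 0.4), gaussian_expectation_closed(0.7, 1.5, 0.4))
```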

For some continuous distributions, no closed-form characteristic function is known and we need a proper
numerical approximation method.

If the support of the distribution is bounded, we can use for example Gauss-Legendre quadrature [48]. If
the support is infinite, we can truncate the distribution and approximate the integral over the remaining finite
interval. Generally speaking, a quadrature method approximates $\int_a^b f(x)\,dx$ by a linear sum $\sum_{i=1}^{k} c_i f(x_i)$
where $c_i$ and $x_i$ are some constants independent of the function $f$. A typical practice is to use a composite rule,
that is, to partition $[a, b]$ into $M$ subintervals and approximate the integral using some quadrature formula over
each subinterval. For the example of Gauss-Legendre quadrature, assuming continuity of the $2k$th derivative
of $f(x)$ for some constant $k$, if we partition $[a, b]$ into $M$ subintervals and apply Gauss-Legendre quadrature
of degree $k$ to each subinterval, the approximation error is

$$\mathrm{Error} = \frac{(b-a)^{2k+1}}{M^{2k}} \cdot \frac{(k!)^4}{(2k+1)[(2k)!]^3} f^{(2k)}(\xi),$$

where $\xi$ is some point in $(a, b)$ [48, pp.116]. Let $\Delta = \frac{b-a}{M}$. If we treat $k$ as a constant, the behavior of the
error (in terms of $\Delta$) is $\mathrm{Error}(\Delta) = \Theta(\Delta^{2k} \max_\xi f^{(2k)}(\xi))$. Therefore, if the support and $\max_\xi f^{(2k)}(\xi)$
are bounded by a polynomial, we can approximate the integral, in polynomial time, such that the error is
$O(1/n^{\beta})$ for any fixed integer $\beta$.
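As an illustration of the composite rule, the sketch below implements the 2-point Gauss-Legendre rule ($k = 2$) on equal subintervals and checks that halving $\Delta$ reduces the error by about $2^{2k} = 16$. The test integrand $e^x$ on $[0, 2]$ and the function name are illustrative assumptions.

```python
import math

def composite_gauss_legendre_2(f, a, b, m):
    """Composite 2-point Gauss-Legendre rule on [a, b] with m equal subintervals."""
    node = 1 / math.sqrt(3)           # nodes are +/- 1/sqrt(3) on [-1, 1]; both weights equal 1
    width = (b - a) / m
    half = width / 2
    total = 0.0
    for j in range(m):
        mid = a + (j + 0.5) * width
        # Map the reference nodes to the j-th subinterval and scale by its half-width.
        total += half * (f(mid - half * node) + f(mid + half * node))
    return total

exact = math.exp(2) - 1               # integral of e^x over [0, 2]
err4 = abs(composite_gauss_legendre_2(math.exp, 0, 2, 4) - exact)
err8 = abs(composite_gauss_legendre_2(math.exp, 0, 2, 8) - exact)
print(err4, err8, err4 / err8)
```

The observed ratio close to 16 matches the $\Delta^{2k}$ behavior of the error for $k = 2$.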

The next lemma shows that we do not lose too much even though we can only get an approximation of
$E[\phi_i^{w_e}]$.

Lemma 7 Suppose in Theorem 3, we can only compute an approximate value of $E[\phi_i^{w_e}]$, denoted by $E_{e,i}$,
for each $e$ and $i$, such that $|E[\phi_i^{w_e}] - E_{e,i}| \le O(n^{-\beta})$ for some positive integer $\beta$. Denote
$E(S) = \sum_{k=1}^{L} c_k \prod_{e \in S} E_{e,k}$. For any solution $S$, we have that

$$|E[\tilde{\mu}(w(S))] - E(S)| \le O(n^{1-\beta}).$$

Proof: We need the following simple result (see [34] for a proof): Suppose $a_1, \ldots, a_n$ and $e_1, \ldots, e_n$ are complex
numbers such that $|a_i| \le 1$ and $|e_i| \le n^{-\beta}$ for all $i$ and some $\beta > 1$. Then, we have

$$\Big|\prod_{i=1}^{n}(a_i + e_i) - \prod_{i=1}^{n} a_i\Big| \le O(n^{1-\beta}).$$

Since $|\phi_i| \le 1$, we can see that

$$|E[\phi_i^{w_e}]| = \Big|\int_{x \ge 0} \phi_i^x p_e(x)\,dx\Big| \le 1.$$

The lemma simply follows by applying the above result and noticing that $L$ and all $c_k$s are constants. □
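The product perturbation bound used in the proof can be checked empirically. The sketch below draws random complex numbers with $|a_i| \le 1$ and perturbations with $|e_i| = n^{-\beta}$, and compares the gap with $2n^{1-\beta}$; the constant 2 is an illustrative choice that is valid because the gap is at most $(1 + n^{-\beta})^n - 1 \le e^{n^{1-\beta}} - 1$.

```python
import cmath
import random

def product(values):
    """Product of an iterable of complex numbers."""
    result = 1 + 0j
    for v in values:
        result *= v
    return result

def perturbation_gap(n, beta, seed=0):
    """|prod(a_i + e_i) - prod(a_i)| for random |a_i| <= 1 and |e_i| = n^(-beta)."""
    rng = random.Random(seed)
    a = [cmath.rect(rng.random(), rng.uniform(0, 2 * cmath.pi)) for _ in range(n)]
    e = [cmath.rect(n ** -beta, rng.uniform(0, 2 * cmath.pi)) for _ in range(n)]
    return abs(product(x + y for x, y in zip(a, e)) - product(a))

for n in (10, 50, 200):
    print(n, perturbation_gap(n, beta=2), 2 * n ** (1 - 2))
```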

We can show that Theorem 1 still holds even though we only have the approximations of the $E[a^{w_e}]$
values. The proof is straightforward and omitted.

D How to Choose h



In this section, we discuss the issue left in Section 2.2, that is, how to choose $h$ (the value should be
independent of the $c_k$s and $L$) to satisfy (3), when $\mu$ satisfies the $\alpha$-Hölder condition for some $\alpha > 1/2$.

We need the following result about the absolute convergence of Fourier coefficients. If $f$ satisfies the
$\alpha$-Hölder condition for some $\alpha > 1/2$, then $\sum_{k=-\infty}^{+\infty} |c_k| \le |f|_{C^{0,\alpha}} \cdot c_\alpha$, where $c_\alpha$ only depends on $\alpha$ [52].

Suppose the original utility function $\mu$ satisfies the $\alpha$-Hölder condition with coefficient $C$, for some
$\alpha > 1/2$. Now, we apply ESUM to $\mu$. By Lemma 4, we know that the piecewise function $\hat{\mu}$ satisfies the
$\alpha$-Hölder condition with coefficient $2C$. Therefore, we can easily see that $f(x) = \hat{\mu}(x)\eta^x$ satisfies the
$\alpha$-Hölder condition with coefficient at most $2^{1+2T_\epsilon}C$ on $[-hT_\epsilon, hT_\epsilon]$ (this is because $\hat{\mu}$ is non-zero only in
$[-2T_\epsilon, 2T_\epsilon]$). According to Lemma 5, we have $|f(x \cdot \frac{hT_\epsilon}{\pi})|_{C^{0,\alpha}} \le 2^{1+2T_\epsilon}(\frac{hT_\epsilon}{\pi})^{\alpha} C$.

Therefore, it is sufficient to set value h such that 

hT, > log = 2T, + 0(log(^)) . 

We can easily verify that we can satisfy the above condition by letting /i = log ^). 



E Details for Stochastic Knapsack with Random Sizes 

We first make a guess of the optimal profit, rounded down to the nearest power of (1 + e). There are at 
most logi+g guesses. For each guess g, we solve the following problem. We discard all items with 

a profit larger than g. Let A = For each item with a profit smaller than ^, we set its new profit to be 
Vi = 0. Then, we scale each of the rest profits to = A[^J . Now, we define the feasible set 

Ha) = {S I ^(1 - 2e)g <Y,Vi<{l + 2e)g}. 

ieS ieS 

Since there are at most ^ distinct v values, we can easily show that finding a solution S in J^{g) with a 
given total size can be solved in pseudopolynomial time by dynamic programming. 

Denote the optimal solution by $S^*$ and the optimal profit by $OPT$. Suppose $g$ is the right guess, i.e.,
$\frac{1}{1+\epsilon} OPT \le g \le OPT$. We can easily see that for any solution $S$, we have that

ies ies ies 

where the first inequalities are due to Vi > ^ and we set at most eg profit to zero. Therefore, we can see 
S* e J='{g). Applying Corollary 2, we obtain a solution S such that Pi{w{S) < 1 + 5) + e > Pv{w{S*) < 
1 + S). Moreover, the profit of this solution v{S) = Y,ieS ""i ^ T,ieS Vi>{l- 2e)g > (1 - 0{e))OPT. 



F Details for Stochastic Knapsack with Random Profits 

We first show that the $1 + \epsilon$ relaxation of the capacity constraint is necessary. Consider the following
knapsack instance. The profit of each item is the same as its size. The given threshold is 1. We can see that
the optimal probability is 1 if and only if there is a subset of items of total size exactly 1. Otherwise, the
optimal probability is 0. Therefore, it is NP-hard to approximate the original problem within any additive
error less than 1 without violating the capacity constraint.

The corresponding exact version of the deterministic problem is to find a set of items $S$ such that $w(S) \le 1$
and $v(S)$ is equal to a given target value. In fact, there is no pseudopolynomial time algorithm for this
problem unless P = NP, since otherwise we could get an $\epsilon$ additive approximation without violating the capacity
constraint, contradicting the lower bound argument. Note that a pseudopolynomial time algorithm here should run in
time polynomial in the profit value (not the size). However, if the sizes can be encoded in $O(\log n)$ bits (we
only have a polynomial number of different sizes), we can solve the problem in time polynomial in $n$ and
the largest profit value by standard dynamic programming.

For general sizes, we can round the size of each item down to the nearest multiple of $\frac{\epsilon}{n}$. Then, we can
solve the exact version in pseudopolynomial time by dynamic programming. It is easy to show that for any
subset of items, its total size is at most the total rounded size plus $\epsilon$. Therefore, the total size of our solution
is at most $1 + \epsilon$.